{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Identify Items That Use Insecure URLs\n",
    "\n",
    "> * 👟 Ready To Run!\n",
    "* 🗄️ Administration\n",
    "* 📦 Content Management\n",
    "\n",
    "Items of type WebMap, WebScene, or App contain collections of layers, basemaps, and other external services hosted on ArcGIS Online/Server. These services can be connected to via `http://` or `https://`, with HTTPS being the more secure protocol since it encrypts the connection. __It is recommended that all service URLs use the `https://` (or say, SSL) protocol__.\n",
    "\n",
    "This notebook will search through all WebMap/WebScene/App Items in a portal/organization, identifying the 'insecure' ones if one or more service URLs use `http://`. These items will be displayed in this notebook, persisted in `.csv` files, and can have the `potentially_insecure` tag added to them."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>**Table of Contents**<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Identify-Items-That-Use-Insecure-URLs\" data-toc-modified-id=\"Identify-Items-That-Use-Insecure-URLs-1\">Identify Items That Use Insecure URLs</a></span><ul class=\"toc-item\"><li><span><a href=\"#Configure-Behavior\" data-toc-modified-id=\"Configure-Behavior-1.1\">Configure Behavior</a></span></li><li><span><a href=\"#Detecting-http-vs-https\" data-toc-modified-id=\"Detecting-http-vs-https-1.2\">Detecting http vs https</a></span><ul class=\"toc-item\"><li><span><a href=\"#WebMaps\" data-toc-modified-id=\"WebMaps-1.2.1\">WebMaps</a></span></li><li><span><a href=\"#WebScenes\" data-toc-modified-id=\"WebScenes-1.2.2\">WebScenes</a></span></li><li><span><a href=\"#Apps\" data-toc-modified-id=\"Apps-1.2.3\">Apps</a></span></li></ul></li><li><span><a href=\"#Output-CSV-Files\" data-toc-modified-id=\"Output-CSV-Files-1.3\">Output CSV Files</a></span></li><li><span><a href=\"#Miscellaneous-Functionality\" data-toc-modified-id=\"Miscellaneous-Functionality-1.4\">Miscellaneous Functionality</a></span></li><li><span><a href=\"#main()\" data-toc-modified-id=\"main()-1.5\">main()</a></span></li><li><span><a href=\"#Post-Processing\" data-toc-modified-id=\"Post-Processing-1.6\">Post Processing</a></span></li></ul></li><li><span><a href=\"#Conclusion\" data-toc-modified-id=\"Conclusion-2\">Conclusion</a></span><ul class=\"toc-item\"><li><ul class=\"toc-item\"><li><span><a href=\"#Rewrite-this-Notebook\" data-toc-modified-id=\"Rewrite-this-Notebook-2.0.1\">Rewrite this Notebook</a></span></li><li><span><a href=\"#Related-Notebooks\" data-toc-modified-id=\"Related-Notebooks-2.0.2\">Related Notebooks</a></span></li></ul></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get started, import the necessary libraries and connect to our GIS:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv, os\n",
    "import time\n",
    "from IPython.display import display, HTML\n",
    "import json\n",
    "import pandas\n",
    "import logging\n",
    "log = logging.getLogger()\n",
    "\n",
    "from arcgis.map import Map, Scene\n",
    "from arcgis.gis import GIS\n",
    "\n",
    "# login with your admin profile\n",
    "gis = GIS(profile=\"Home\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Configure Behavior\n",
    "\n",
    "Now, let's configure some variables specific to our organization that will tell our notebook how we want it to run. With the default `CHECK_ALL_ITEMS` set to `True`, this notebook will apply this check to all items in an organization/portal. If you would instead prefer to only apply this check to certain groups of items, set `CHECK_ALL_ITEMS` to `False`, then set `GROUP_NAMES` to a list of group name strings.\n",
    "\n",
    "Modify the below cell to change that default behavior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set to `True` if you would like to check ALL items in an org/portal\n",
    "CHECK_ALL_ITEMS = True\n",
    "# If `CHECK_ALL_ITEMS` is `False`, then it will check all items in these groups\n",
    "CHECK_THESE_GROUPS = ['group_name_1', 'group_name_2']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's specify what types of items we want to test. By default, this notebook will check `WebMap`, `WebScene`, and any `App` items.\n",
    "\n",
    "Modify the below cell to change that default behavior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "CHECK_WEBMAPS = True\n",
    "CHECK_WEBSCENES = True\n",
    "CHECK_APPS = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's specify what kind of behavior we want when we come across an insecure item. This notebook will automatically sort and display the insecure and secure items, but we can also configure if we want to add a `potentially_insecure` tag to all insecure items.\n",
    "\n",
    "The default behavior is __NOT__ to add the tag. Modify the below cell to change that default behavior."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "TRY_TAG_INSECURE_ITEMS = False"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Detecting http vs https\n",
    "\n",
    "A core component of this notebook will be detecting if a URL is `http://` or `https://`. We will do this by creating helper functions that use the built-in string library to see what the URL string starts with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def is_https(url):\n",
    "    return str(url).startswith(\"https:/\")\n",
    "\n",
    "def is_http(url):\n",
    "    return str(url).startswith(\"http:/\")"
   ]
  },
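  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a complementary sketch (not part of the original workflow), the scheme can also be read with the standard library's `urllib.parse.urlsplit`, which returns the scheme in lowercase. The `url_scheme` helper name and the sample URLs are assumptions made for illustration only."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from urllib.parse import urlsplit\n",
    "\n",
    "def url_scheme(url):\n",
    "    \"\"\"Hypothetical helper: return the lower-cased scheme of `url`,\n",
    "    or an empty string if the value has no scheme.\n",
    "    \"\"\"\n",
    "    return urlsplit(str(url).strip()).scheme.lower()\n",
    "\n",
    "# Sample values made up for illustration\n",
    "for candidate in [\"https://example.com/arcgis/rest/services\",\n",
    "                  \"HTTP://example.com/arcgis/rest/services\",\n",
    "                  \"not a url\"]:\n",
    "    print(candidate, \"->\", repr(url_scheme(candidate)))"
   ]
  },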
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### WebMaps\n",
    "\n",
    "This code cell defines a function that will test all URLs in a web map item; it will return the URLs that use `https://` and the URLs that use `http://`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test_https_in_webmap(webmap_item):\n",
    "    \"\"\"Takes in an `Item` class instance of a Web Map Item.\n",
    "    Sorts all operational layers and basemap layers based on if\n",
    "    they are http or https, returns a tuple of \n",
    "    (https_urls, http_urls), with each being a list of URLs\n",
    "    \"\"\"\n",
    "    https_urls = []\n",
    "    http_urls = []\n",
    "    wm = Map(item=webmap_item)\n",
    "\n",
    "    # Concatenate all operational layers and basemap layers to one list\n",
    "    all_layers = []\n",
    "    for operationalLayer in wm.content.layers:\n",
    "        if hasattr(operationalLayer, 'layers'):\n",
    "            for layer in operationalLayer.layers:\n",
    "                all_layers.append({\"url\":layer.url})\n",
    "        else:\n",
    "            all_layers.append({\"url\":operationalLayer.url})\n",
    "    if hasattr(wm.basemap.basemap, 'baseMapLayers'):\n",
    "        all_layers += wm.basemap.basemap.baseMapLayers\n",
    "\n",
    "    # Test all of the layers, return the results\n",
    "    for layer in [layer for layer in all_layers \\\n",
    "                  if hasattr(layer, 'url')]:\n",
    "        if is_https(layer.url):\n",
    "            log.debug(f\"    [✓] url {layer['url']} is https\")\n",
    "            https_urls.append(layer.url)\n",
    "        elif is_http(layer.url):\n",
    "            log.debug(f\"    [X] url {layer['url']} is http\")\n",
    "            http_urls.append(layer.url)\n",
    "    return (https_urls, http_urls)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### WebScenes\n",
    "\n",
    "This code cell defines a function that will test all URLs in a web scene item; it will return the URLs that use `https://` and the URLs that use `http://`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test_https_in_webscene(webscene_item):\n",
    "    \"\"\"Takes in an `Item` class instance of a web scene item.\n",
    "    Sorts all operational layers and basemap layers based on if\n",
    "    they are http or https, returns a tuple of \n",
    "    (https_urls, http_urls), with each being a list of URLs\n",
    "    \"\"\"\n",
    "    https_urls = []\n",
    "    http_urls = []\n",
    "    ws = Scene(item=webscene_item)\n",
    "\n",
    "    # Concatenate all operational layers and basemap layers to one list\n",
    "    all_layers = []\n",
    "    for operationalLayer in ws.content.layers:\n",
    "        if hasattr(operationalLayer, 'layers'):\n",
    "            for layer in operationalLayer.layers:\n",
    "                all_layers.append({\"url\":layer.url})\n",
    "        else:\n",
    "            all_layers.append({\"url\":operationalLayer.url})\n",
    "    for bm_layer in ws.basemap.basemap.get('baseMapLayers', []):\n",
    "        all_layers.append(bm_layer)\n",
    "\n",
    "    # Test all of the layers, return the results\n",
    "    for layer in [layer for layer in all_layers \\\n",
    "                  if layer.get('url', False)]:\n",
    "        if is_https(layer.get('url', False)):\n",
    "            log.debug(f\"    [✓] url {layer['url']} is https\")\n",
    "            https_urls.append(layer['url'])\n",
    "        elif is_http(layer.get('url', False)):\n",
    "            log.debug(f\"    [X] url {layer['url']} is http\")\n",
    "            http_urls.append(layer['url'])\n",
    "    return (https_urls, http_urls)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Apps\n",
    "\n",
    "This code cell defines a function that will test all URLs in an app item; it will return the URLs that use `https://` and the URLs that use `http://`.\n",
    "\n",
__Note__: App items">
    ">__Note__: App items don't have as standardized a JSON format as WebMaps and WebScenes. Therefore, the logic used to detect URLs in App Items will test every nested value in the dictionary returned from a `get_data()` call."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_values_recurs(dict_):\n",
    "    \"\"\"Helper function to get all nested values in a dict.\"\"\"\n",
    "    output = []\n",
    "    if isinstance(dict_, dict):\n",
    "        for value in dict_.values():\n",
    "            if isinstance(value, dict):\n",
    "                output += get_values_recurs(value)\n",
    "            elif isinstance(value, list):\n",
    "                for entry in value:\n",
    "                    output += get_values_recurs({\"_\":entry})\n",
    "            else:\n",
    "                output += [value,]\n",
    "    return output\n",
    "\n",
    "def test_https_in_app(app_item):\n",
    "    \"\"\"Takes in an `Item` class instance of any 'App' Item.\n",
    "    Will call `.get_data()` on the Item, and will search through\n",
    "    EVERY value nested inside the data dict, sorting each URL\n",
    "    found to either `https_urls` or `http_urls`, returning the \n",
    "    tuple of (https_urls, http_url)\n",
    "    \"\"\"\n",
    "    https_urls = []\n",
    "    http_urls = []\n",
    "    all_values = get_values_recurs(app_item.get_data())\n",
    "    for value in all_values:\n",
    "        if is_https(value):\n",
    "            https_urls.append(value)\n",
    "        elif is_http(value):\n",
    "            http_urls.append(value)\n",
    "    return (https_urls, http_urls)"
   ]
  },
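  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the recursive search concrete, here is a minimal, self-contained sketch of the same idea written as a generator. The `iter_nested_values` name and the sample dictionary are hypothetical and do not come from a real app item; unlike `get_values_recurs()`, this variant also accepts a list at the top level."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def iter_nested_values(data):\n",
    "    \"\"\"Hypothetical helper: yield every leaf value found in nested\n",
    "    dicts and lists.\n",
    "    \"\"\"\n",
    "    if isinstance(data, dict):\n",
    "        for value in data.values():\n",
    "            yield from iter_nested_values(value)\n",
    "    elif isinstance(data, list):\n",
    "        for entry in data:\n",
    "            yield from iter_nested_values(entry)\n",
    "    else:\n",
    "        yield data\n",
    "\n",
    "# Made-up app data, for illustration only\n",
    "sample_app_data = {\n",
    "    \"values\": {\n",
    "        \"webmap\": \"abc123\",\n",
    "        \"logo\": \"http://example.com/logo.png\",\n",
    "        \"links\": [{\"href\": \"https://example.com/help\"}, \"plain text\"],\n",
    "    }\n",
    "}\n",
    "\n",
    "# Keep only the leaf values that look like http/https URLs\n",
    "sample_urls = [value for value in iter_nested_values(sample_app_data)\n",
    "               if str(value).startswith((\"http://\", \"https://\"))]\n",
    "print(sample_urls)"
   ]
  },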
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The previously defined `test_https_...()` functions all follow a similar prototype of returning a tuple of `(https_urls, http_urls)`. We can therefore define a helper function that will sort for us and call the correct function, based on the `item.type` property and the previously defined configuration variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "def test_https_for(item):\n",
    "    \"\"\"Given an `Item` instance, call the correct function and return \n",
    "    (https_urls, http_urls). Will return (None, None) if the item type \n",
    "    is not supported, or if configured to not check that item type.\n",
    "    \"\"\"\n",
    "    if (item.type == \"Web Map\") and CHECK_WEBMAPS:\n",
    "        return test_https_in_webmap(item)\n",
    "    elif (item.type == \"Web Scene\") and CHECK_WEBSCENES:\n",
    "        return test_https_in_webscene(item)\n",
    "    elif (\"App\" in item.type) and CHECK_APPS:\n",
    "        return test_https_in_app(item)\n",
    "    else:\n",
    "        return ([],[])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Output CSV Files\n",
    "\n",
    "We will be persisting the results of this notebook as two `.csv` files in the `/arcgis/home` folder, which will then also publish to our portal.\n",
    "\n",
    "One `.csv` file (`ALL_URLS.csv`) will contain one row per URL. This file will contain an in-depth, comprehensive look of all secure/insecure URLs and how they are related to items. This file is best analyzed by filtering in desktop spreadsheet software, manipulating in a `pandas` DataFrame, etc.\n",
    "\n",
    "The other `.csv` file (`INSECURE_ITEMS.csv`) will contain one row per Item. This will be a useful, 'human-readable' table that will give us a quick insight into what items contain insecure URLs.\n",
    "\n",
    "Let's create a `create_csvs()` function that creates these files with the appropriate columns and unique filenames; it will be called on notebook start."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [],
   "source": [
    "insecure_items_columns = ['item_id', 'item_title', 'item_url',\n",
    "                         'item_type', 'https_urls', 'http_urls']\n",
    "all_urls_columns = ['url', 'is_secure', 'item_id', \n",
    "                    'item_title', 'item_url', 'item_type']\n",
    "\n",
    "workspace = \"./arcgis/home\"\n",
    "\n",
    "current_time = time.time()\n",
    "formatted_time = time.strftime(\"%Y-%m-%d_%H-%M-%S\", time.localtime(current_time))\n",
    "\n",
    "\n",
    "if not os.path.exists(workspace):\n",
    "    os.makedirs(workspace)\n",
    "\n",
    "def create_csvs():\n",
    "    \"\"\"When called, will create the two output .csv files with unique \n",
    "    filenames. Returns a tuple of the string file paths\n",
    "    (all_urls_path, insecure_items_path)\n",
    "    \"\"\"\n",
    "    all_urls_path = f'{workspace}/ALL_URLs-{formatted_time}.csv'\n",
    "    insecure_items_path = f'{workspace}/INSECURE_ITEMS-{formatted_time}.csv'\n",
    "    for file_path, columns in [(all_urls_path, all_urls_columns),\n",
    "                   (insecure_items_path, insecure_items_columns)]:\n",
    "        with open(file_path, 'w') as file:\n",
    "            writer = csv.DictWriter(file, columns)\n",
    "            writer.writeheader()\n",
    "    return (all_urls_path, insecure_items_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that the `.csv` files have been made with the correct headers/columns, we can create a function to add a row to the `ALL_URLS.csv` file. Each URL gets its own row, an `is_secure` boolean, and information related to the item the URL came from (item id, item type, etc.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def write_row_to_urls_csv(url, is_secure, item, file_path):\n",
    "    \"\"\"Given any URL from an item we've tested, write a\n",
    "    row to the output 'ALL_URLs.csv', located at `file_path`. This .csv\n",
    "    will have one row per URL, with information such as an `is_secure`\n",
    "    boolean, information about the item that contained the URL, etc.\n",
    "    \"\"\"\n",
    "    with open(file_path, 'a') as file:\n",
    "        writer = csv.DictWriter(file, all_urls_columns)\n",
    "        writer.writerow({'url' : url,\n",
    "                         'is_secure' : is_secure,\n",
    "                         'item_id' : item.id,\n",
    "                         'item_title' : item.title,\n",
    "                         'item_url' : item.homepage,\n",
    "                         'item_type' : item.type})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can create a function to add a row to the `INSECURE_ITEMS.csv` file. In this file, each Item gets its own row, with related information like its item id, item url, a JSON representation of the https_urls, a JSON representation of http_urls, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "def write_row_to_insecure_csv(item, https_urls, http_urls, file_path):\n",
    "    \"\"\"Given an insecure item, write a row to the output \n",
    "    'INSECURE_URLS.csv' file, located at `file_path`. This .csv will \n",
    "    have one row per item, with information such as the item's ID,the \n",
    "    item's URL, a JSON representation of the list of http_urls and \n",
    "    https_urls, etc.\n",
    "    \"\"\"\n",
    "    with open(file_path, 'a') as file:\n",
    "        writer = csv.DictWriter(file, insecure_items_columns)\n",
    "        writer.writerow({'item_id' : item.id,\n",
    "                         'item_title' : item.title,\n",
    "                         'item_url' : item.homepage,\n",
    "                         'item_type' : item.type,\n",
    "                         'https_urls' : json.dumps(https_urls),\n",
    "                         'http_urls' : json.dumps(http_urls)})"
   ]
  },
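  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the URL lists are stored as JSON strings inside a single CSV field, they need to be decoded again when the file is read back. The following is a minimal sketch of that round trip using an in-memory buffer instead of a real file; the `demo_columns` subset, the item ID, and the URLs are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "import json\n",
    "\n",
    "# Hypothetical subset of the real columns, to keep the sketch short\n",
    "demo_columns = ['item_id', 'http_urls']\n",
    "\n",
    "# Write one row to an in-memory buffer, encoding the list as JSON\n",
    "buffer = io.StringIO(newline='')\n",
    "demo_writer = csv.DictWriter(buffer, demo_columns)\n",
    "demo_writer.writeheader()\n",
    "demo_writer.writerow({\n",
    "    'item_id': 'hypothetical_item_id',\n",
    "    'http_urls': json.dumps([\"http://example.com/a\",\n",
    "                             \"http://example.com/b\"]),\n",
    "})\n",
    "\n",
    "# Read the row back and decode the JSON field into a Python list\n",
    "buffer.seek(0)\n",
    "demo_rows = list(csv.DictReader(buffer))\n",
    "decoded_http_urls = json.loads(demo_rows[0]['http_urls'])\n",
    "print(decoded_http_urls)"
   ]
  },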
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Miscellaneous Functionality\n",
    "\n",
    "Another way we can persist the results from this notebook is to attempt to add a tag of `potentially_insecure` to all the insecure items we find via this function.\n",
    "\n",
    "> __Note__: An exception will NOT be thrown if an item's tag cannot be updated due to permissions, not being the item owner, etc. A warning message will be logged, but the function will return and the notebook will continue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "def try_tag_item_as_insecure(item):\n",
    "    \"\"\"Will attempt to add a tag to the item that will mark it as \n",
    "    potentially insecure. If the tag cannot be updated (permissions,\n",
    "    not the owner, etc.), this function will still return, but it\n",
    "    will print out a WARNING message\n",
    "    \"\"\"\n",
    "    try:\n",
    "        tag_to_add = \"potentially_insecure\"\n",
    "        if tag_to_add not in item.tags:\n",
    "            item.update({'tags': item.tags + [tag_to_add]})\n",
    "    except Exception as e:\n",
    "        log.warning(f\"Could not tag item {item.id} as '{tag_to_add}'...\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's create a generator function that will `yield` `Item`(s). This notebook can run against all items in an organization or portal, or all items from certain groups, depending on the value of the previously defined configuration variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_items_to_check():\n",
    "    \"\"\"Generator function that will yield Items depending on how you \n",
    "    configured your notebook. Will either yield every item in an \n",
    "    organization, or will yield items in specific groups.\n",
    "    \"\"\"\n",
    "    if CHECK_ALL_ITEMS:\n",
    "        for user in gis.users.search():\n",
    "            for item in user.items(max_items=999999999):\n",
    "                # For the user's root folder\n",
    "                yield item\n",
    "            for folder in user.folders:\n",
    "                # For all the user's other folders\n",
    "                for item in user.items(folder, max_items=999999999):\n",
    "                    yield item\n",
    "    else:\n",
    "        for group_name in CHECK_THESE_GROUPS:\n",
    "            group = gis.groups.search(f\"title: {group_name}\")[0]\n",
    "            for item in group.content():\n",
    "                yield item"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## main()\n",
    "\n",
    "Finally, let's create our `main()` function that links together all our previously defined functions that get all our web maps, web scenes, and apps, test the items, and write the results to the correct places."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# After running main(), these in-memory variables will be populated\n",
    "secure_items = []\n",
    "insecure_items = []\n",
    "all_urls_csv_item = None\n",
    "insecure_items_csv_item = None\n",
    "\n",
    "def main():\n",
    "    # Tell user we're running, initialize variables/files\n",
    "    print(\"Notebook is now running, please wait...\\n-----\")\n",
    "    global secure_items, insecure_items, \\\n",
    "        all_urls_csv_item, insecure_items_csv_item\n",
    "    secure_items = []\n",
    "    insecure_items = []\n",
    "    all_urls_path, insecure_items_path = create_csvs()\n",
    "    \n",
    "    # Test each item, write to the appropriate file\n",
    "    for item in get_items_to_check():\n",
    "        try:\n",
    "            https_urls, http_urls = test_https_for(item)\n",
    "\n",
    "            # add all the item's URLs to the 'ALL_URLs.csv' output file\n",
    "            for urls, is_secure in [(https_urls, True), (http_urls, False)]:\n",
    "                for url in urls:\n",
    "                    write_row_to_urls_csv(url, is_secure, \n",
    "                                          item, all_urls_path)\n",
    "\n",
    "            # If the item is insecure, add to 'INSECURE_ITEMS.csv' out file\n",
    "            if http_urls:\n",
    "                insecure_items.append(item)\n",
    "                write_row_to_insecure_csv(item, https_urls, http_urls,\n",
    "                                          insecure_items_path)\n",
    "                if TRY_TAG_INSECURE_ITEMS:\n",
    "                    try_tag_item_as_insecure(item)\n",
    "            elif https_urls:\n",
    "                secure_items.append(item)\n",
    "        except:\n",
    "            print(f' unable to process {item}')\n",
    "            pass\n",
    "\n",
    "    # Publish the csv files, display them in the notebook\n",
    "    display(HTML(\"<h1><u>RESULTS</u><h1>\"))\n",
    "    all_urls_csv_item = gis.content.add({}, all_urls_path)\n",
    "    display(all_urls_csv_item)\n",
    "    insecure_items_csv_item = gis.content.add({}, insecure_items_path)\n",
    "    display(insecure_items_csv_item)\n",
    "\n",
    "    # Display the items with insecure URLs\n",
    "    max_num_items_to_display = 10\n",
    "    display(HTML(f\"<h3>{len(insecure_items)} ITEMS \"\\\n",
    "                 \"USE INSECURE URLs</h3>\"))\n",
    "    for item in insecure_items[0:max_num_items_to_display]:\n",
    "        display(item)\n",
    "\n",
    "    # Tell user we're finished\n",
    "    print(\"-----\\nNotebook completed running.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We have just defined a `main()` function, but we haven't called it yet. If you've modified the notebook, follow these steps:\n",
    "1. __Double check the notebook content__. Make sure no secrets are visible in the notebook, delete unused code, refactor, etc.\n",
    "2. Save the notebook\n",
    "3. In the 'Kernel' menu, press 'Restart and Run All' to run the whole notebook from top to bottom\n",
    "\n",
    "Now, `main()` can be called."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Notebook is now running, please wait...\n",
      "-----\n",
      "item: -- <Item title:\"StreamOverlay178519_Buffer\" type:Feature Layer Collection owner:tk_geosaurus>\n",
      "item: -- <Item title:\"StreamOverlay178519_Buffer\" type:Service Definition owner:tk_geosaurus>\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<h1><u>RESULTS</u><h1>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div class=\"item_container\" style=\"height: auto; overflow: hidden; border: 1px solid #cfcfcf; border-radius: 2px; background: #f6fafa; line-height: 1.21429em; padding: 10px;\">\n",
       "                    <div class=\"item_left\" style=\"width: 210px; float: left;\">\n",
       "                       <a href='https://example.com/home/item.html?id=faa917e525f54ceba471555d9c3cdd9d' target='_blank'>\n",
       "                        <img src='http://static.arcgis.com/images/desktopapp.png' class=\"itemThumbnail\">\n",
       "                       </a>\n",
       "                    </div>\n",
       "\n",
       "                    <div class=\"item_right\"     style=\"float: none; width: auto; overflow: hidden;\">\n",
       "                        <a href='https://example.com/home/item.html?id=faa917e525f54ceba471555d9c3cdd9d' target='_blank'><b>ALL_URLs-2024-05-07_13-38-37</b>\n",
       "                        </a>\n",
       "                        <br/><img src='https://example.com/home/js/arcgisonline/img/item-types/datafiles16.svg' style=\"vertical-align:middle;\" width=16 height=16>CSV by tk_geosaurus\n",
       "                        <br/>Last Modified: May 07, 2024\n",
       "                        <br/>0 comments, 0 views\n",
       "                    </div>\n",
       "                </div>\n",
       "                "
      ],
      "text/plain": [
       "<Item title:\"ALL_URLs-2024-05-07_13-38-37\" type:CSV owner:tk_geosaurus>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div class=\"item_container\" style=\"height: auto; overflow: hidden; border: 1px solid #cfcfcf; border-radius: 2px; background: #f6fafa; line-height: 1.21429em; padding: 10px;\">\n",
       "                    <div class=\"item_left\" style=\"width: 210px; float: left;\">\n",
       "                       <a href='https://example.com/home/item.html?id=d8aa064b04e742b382eac38ca4cfd036' target='_blank'>\n",
       "                        <img src='http://static.arcgis.com/images/desktopapp.png' class=\"itemThumbnail\">\n",
       "                       </a>\n",
       "                    </div>\n",
       "\n",
       "                    <div class=\"item_right\"     style=\"float: none; width: auto; overflow: hidden;\">\n",
       "                        <a href='https://example.com/home/item.html?id=d8aa064b04e742b382eac38ca4cfd036' target='_blank'><b>INSECURE_ITEMS-2024-05-07_13-38-37</b>\n",
       "                        </a>\n",
       "                        <br/><img src='https://example.com/home/js/arcgisonline/img/item-types/datafiles16.svg' style=\"vertical-align:middle;\" width=16 height=16>CSV by tk_geosaurus\n",
       "                        <br/>Last Modified: May 07, 2024\n",
       "                        <br/>0 comments, 0 views\n",
       "                    </div>\n",
       "                </div>\n",
       "                "
      ],
      "text/plain": [
       "<Item title:\"INSECURE_ITEMS-2024-05-07_13-38-37\" type:CSV owner:tk_geosaurus>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<h3>0 ITEMS USE INSECURE URLs</h3>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-----\n",
      "Notebook completed running.\n"
     ]
    }
   ],
   "source": [
    "main()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If configured correctly, this notebook should have output two `.csv` files that can help you identify items that use insecure URLs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Post Processing\n",
    "\n",
    "The `ALL_URLS.csv` file/item contains an in-depth, comprehensive look at all secure and insecure URLs and how they relate to items. This file contains a lot of information, which can be better analyzed using the `pandas` package. This code cell will convert any `.csv` Item to a pandas `DataFrame`; we will be converting the `ALL_URLS.csv` file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>url</th>\n",
       "      <th>is_secure</th>\n",
       "      <th>item_id</th>\n",
       "      <th>item_title</th>\n",
       "      <th>item_url</th>\n",
       "      <th>item_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>https://example.com/ArcGIS/rest/...</td>\n",
       "      <td>True</td>\n",
       "      <td>5d911425a8044bd49f75df77097cc9ea</td>\n",
       "      <td>Python API - Hub demo site</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Hub Site Application</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>https://example.com/geohub-assets/templat...</td>\n",
       "      <td>True</td>\n",
       "      <td>5d911425a8044bd49f75df77097cc9e9</td>\n",
       "      <td>Python API - Hub demo site</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Hub Site Application</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>https://example.com/geohub-assets/templat...</td>\n",
       "      <td>True</td>\n",
       "      <td>5d911425a8044bd49f75df77097cc9e0</td>\n",
       "      <td>Python API - Hub demo site</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Hub Site Application</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>https://example.com/sharing/rest...</td>\n",
       "      <td>True</td>\n",
       "      <td>5d911425a8044bd49f75df77097cc9e8</td>\n",
       "      <td>Python API - Hub demo site</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Hub Site Application</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>https://developers.arcgis.com/python/</td>\n",
       "      <td>True</td>\n",
       "      <td>5d911425a8044bd49f75df77097cc9e7</td>\n",
       "      <td>Python API - Hub demo site</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Hub Site Application</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                 url  is_secure  \\\n",
       "0  https://services.arcgisonline.com/ArcGIS/rest/...       True   \n",
       "1  https://s3.amazonaws.com/geohub-assets/templat...       True   \n",
       "2  https://s3.amazonaws.com/geohub-assets/templat...       True   \n",
       "3  https://geosaurus.maps.arcgis.com/sharing/rest...       True   \n",
       "4              https://developers.arcgis.com/python/       True   \n",
       "\n",
       "                            item_id                  item_title  \\\n",
       "0  5d911425a8044bd49f75df77097cc9ea  Python API - Hub demo site   \n",
       "1  5d911425a8044bd49f75df77097cc9ea  Python API - Hub demo site   \n",
       "2  5d911425a8044bd49f75df77097cc9ea  Python API - Hub demo site   \n",
       "3  5d911425a8044bd49f75df77097cc9ea  Python API - Hub demo site   \n",
       "4  5d911425a8044bd49f75df77097cc9ea  Python API - Hub demo site   \n",
       "\n",
       "                                            item_url             item_type  \n",
       "0  https://geosaurus.maps.arcgis.com/home/item.ht...  Hub Site Application  \n",
       "1  https://geosaurus.maps.arcgis.com/home/item.ht...  Hub Site Application  \n",
       "2  https://geosaurus.maps.arcgis.com/home/item.ht...  Hub Site Application  \n",
       "3  https://geosaurus.maps.arcgis.com/home/item.ht...  Hub Site Application  \n",
       "4  https://geosaurus.maps.arcgis.com/home/item.ht...  Hub Site Application  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "def csv_item_to_dataframe(item):\n",
    "    \"\"\"Takes in an Item instance of a `.csv` file,\n",
    "    returns a pandas DataFrame\n",
    "    \"\"\"\n",
    "    if item is not None:\n",
    "        downloaded_csv_file_path = item.download()\n",
    "        return pandas.read_csv(downloaded_csv_file_path)\n",
    "    else:\n",
    "        print(\"csv item not downloaded\")\n",
    "        return None\n",
    "\n",
    "df = csv_item_to_dataframe(all_urls_csv_item)\n",
    "if df is not None:\n",
    "    display(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you have a pandas `DataFrame` instance, you can run `query()` on it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>url</th>\n",
       "      <th>is_secure</th>\n",
       "      <th>item_id</th>\n",
       "      <th>item_title</th>\n",
       "      <th>item_url</th>\n",
       "      <th>item_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>182</th>\n",
       "      <td>http://example.com/wms</td>\n",
       "      <td>False</td>\n",
       "      <td>10c4a93826d6421baf8b9ec8f92dd737</td>\n",
       "      <td>asdf</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Web Map</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>184</th>\n",
       "      <td>http://example.com/wms</td>\n",
       "      <td>False</td>\n",
       "      <td>55497cebb4784fb19504080dfa44309h</td>\n",
       "      <td>frommapviewer2</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Web Map</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>186</th>\n",
       "      <td>http://example.com/wms</td>\n",
       "      <td>False</td>\n",
       "      <td>a3bc943978844f5d853c4d61728f26a7</td>\n",
       "      <td>asdf</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Web Map</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>188</th>\n",
       "      <td>http://example.com/wms</td>\n",
       "      <td>False</td>\n",
       "      <td>954c4c4e58f841299d4046df0dcd5104</td>\n",
       "      <td>asdf</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Web Map</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>190</th>\n",
       "      <td>http://example.com/wms</td>\n",
       "      <td>False</td>\n",
       "      <td>1f8fea087bc946139cdc4dace1180249</td>\n",
       "      <td>asdf</td>\n",
       "      <td>https://example.com/home/item.ht...</td>\n",
       "      <td>Web Map</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                url  is_secure  \\\n",
       "182  http://wms.chartbundle.com/wms      False   \n",
       "184  http://wms.chartbundle.com/wms      False   \n",
       "186  http://wms.chartbundle.com/wms      False   \n",
       "188  http://wms.chartbundle.com/wms      False   \n",
       "190  http://wms.chartbundle.com/wms      False   \n",
       "\n",
       "                              item_id      item_title  \\\n",
       "182  10c4a93826d6421baf8b9ec8f92dd736            asdf   \n",
       "184  55497cebb4784fb19504080dfa44309a  frommapviewer2   \n",
       "186  a3bc943978844f5d853c4d61728f26a6            asdf   \n",
       "188  954c4c4e58f841299d4046df0dcd5101            asdf   \n",
       "190  1f8fea087bc946139cdc4dace1180240            asdf   \n",
       "\n",
       "                                              item_url item_type  \n",
       "182  https://geosaurus.maps.arcgis.com/home/item.ht...   Web Map  \n",
       "184  https://geosaurus.maps.arcgis.com/home/item.ht...   Web Map  \n",
       "186  https://geosaurus.maps.arcgis.com/home/item.ht...   Web Map  \n",
       "188  https://geosaurus.maps.arcgis.com/home/item.ht...   Web Map  \n",
       "190  https://geosaurus.maps.arcgis.com/home/item.ht...   Web Map  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "if df is not None:\n",
    "    display(df.query(\"is_secure == False\").head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "as well as use any of the other powerful pandas functionality to gain more insight into the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      5d911425a8044bd49f75df77097cc9e9\n",
       "5      8ec563a6886f474c8d991e7748ab4c05\n",
       "16     cd64fa448d7645849a2e624ff56fc15d\n",
       "26     43f9a76b53054b8d9c0bb5b887744c5g\n",
       "29     e43a6ca6678d4288b5947bea032d5462\n",
       "                     ...               \n",
       "483    6b6673c0160e45599fef82ae80f357eh\n",
       "496    25181cd7a0c9411f9a7e7aec410b39f4\n",
       "504    badb4e6fa24e42d88742e8ae154315b3\n",
       "510    6daf908e54e5417eb9e7929d06ecbe18\n",
       "512    ce3d58cace8a49219190c94cf0908f66\n",
       "Name: item_id, Length: 190, dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "if df is not None:\n",
    "    display(df.query(\"is_secure == True\")['item_id'].drop_duplicates())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Conclusion\n",
    "\n",
    "This notebook provided the workflow for identifying WebMap/WebScene/App Items that use insecure URLs and placed the results in two output `.csv` files. This notebook can be a powerful administrative tool to help you increase the security of your maps and apps. As the saying goes: \"Security is always excessive until it's not enough\".\n",
    "\n",
    "### Rewrite this Notebook\n",
    "\n",
    "This notebook can be rewritten to solve related problems. One of these problems is to identify WebMaps/WebScenes/Apps that contain services from an old ArcGIS Server that you are planning to turn off. Replace the `is_http()` and `is_https()` functions with something like:\n",
    "\n",
    "```python\n",
    "def is_from_domain(url):\n",
    "    return 'old-arcgis-server-domain.com' in url\n",
    "```\n",
    "\n",
    "You can then use a lot of the remaining functionality of this notebook to check to make sure that your items would not be affected by turning off the old ArcGIS Server.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "esriNotebookRuntime": {
   "notebookRuntimeName": "ArcGIS Notebook Python 3 Standard",
   "notebookRuntimeVersion": "9.0"
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "**Table of Contents**",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
